1 Exploratory Data Analysis (EDA)

This document presents an EDA of element concentration data (all_data.rds), which record the concentration of critical minerals across different areas of Australia, normalised against the PAAS (Post-Archean Australian Shale) standard. The primary objective of this analysis is to understand the data’s structure and key characteristics, which will inform subsequent modeling and decision-making. Through this analysis, we aim to identify significant trends, correlations, and outliers that may influence the outcomes of the study.

1.1 Load the Data

Our data comprise 11,032 observations of 8 variables. Most of the variables are of character type; the exceptions are Element_Value_ppm, PAAS_value_ppm, and PAAS_normalised_value, which are numeric.

## Classes 'data.table' and 'data.frame':   11032 obs. of  8 variables:
##  $ Project_Name         : chr  "Collingwood Park" "Confidential_B" "Confidential_B" "Confidential_B" ...
##  $ Sample_ID            : chr  "CP-014" "ICP23000472Z291" "ICP23000472Z292" "ICP23000472Z293" ...
##  $ Element_Symbol       : chr  "Ag" "Ag" "Ag" "Ag" ...
##  $ Element_Value_ppm    : num  0.13 0.14 0.11 0.11 0.11 0.11 0.11 0.11 0.15 0.5 ...
##  $ Element_Description  : chr  "Silver" "Silver" "Silver" "Silver" ...
##  $ PAAS_value_ppm       : num  0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
##  $ PAAS_normalised_value: num  2.6 2.8 2.2 2.2 2.2 2.2 2.2 2.2 3 10 ...
##  $ Above_PASS_flag      : chr  "Enriched above background" "Enriched above background" "Enriched above background" "Enriched above background" ...
##  - attr(*, ".internal.selfref")=<externalptr>
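
The structure listing above can be reproduced with a few lines of base R. The sketch below is an illustration only: `load_concentrations()` is a hypothetical helper, and a two-row toy file (values copied from the first two rows shown above) stands in for all_data.rds so that the block runs on its own.

```r
# Hypothetical helper (not part of the original analysis): read an RDS file
# and check that it holds a data frame.
load_concentrations <- function(path) {
  dat <- readRDS(path)
  stopifnot(is.data.frame(dat))
  dat
}

# Toy stand-in with the same column names as the str() output above.
toy <- data.frame(
  Project_Name          = c("Collingwood Park", "Confidential_B"),
  Sample_ID             = c("CP-014", "ICP23000472Z291"),
  Element_Symbol        = c("Ag", "Ag"),
  Element_Value_ppm     = c(0.13, 0.14),
  Element_Description   = c("Silver", "Silver"),
  PAAS_value_ppm        = c(0.05, 0.05),
  PAAS_normalised_value = c(2.6, 2.8),
  Above_PASS_flag       = "Enriched above background",
  stringsAsFactors      = FALSE
)
tmp <- tempfile(fileext = ".rds")
saveRDS(toy, tmp)

all_data <- load_concentrations(tmp)  # with the real file: "all_data.rds"
str(all_data)

# First class of each column, to confirm which variables are numeric.
col_types <- vapply(all_data, function(x) class(x)[1], character(1))
```

With the real file, `col_types` should report three numeric columns and five character columns, matching the description above.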

The table below previews the first 10 observations of the data.

Project_Name Sample_ID Element_Symbol Element_Value_ppm Element_Description PAAS_value_ppm PAAS_normalised_value Above_PASS_flag
2 Collingwood Park CP-014 Ag 0.13 Silver 0.05 2.6 Enriched above background
15 Confidential_B ICP23000472Z291 Ag 0.14 Silver 0.05 2.8 Enriched above background
16 Confidential_B ICP23000472Z292 Ag 0.11 Silver 0.05 2.2 Enriched above background
17 Confidential_B ICP23000472Z293 Ag 0.11 Silver 0.05 2.2 Enriched above background
18 Confidential_B ICP23000472Z294 Ag 0.11 Silver 0.05 2.2 Enriched above background
19 Confidential_B ICP23000472Z299 Ag 0.11 Silver 0.05 2.2 Enriched above background
20 Confidential_B ICP23000472Z300 Ag 0.11 Silver 0.05 2.2 Enriched above background
21 Confidential_B ICP23000472Z301 Ag 0.11 Silver 0.05 2.2 Enriched above background
22 Confidential_B ICP23000472Z302 Ag 0.15 Silver 0.05 3.0 Enriched above background
121 Confidential_C IP23005174R1046 Ag 0.50 Silver 0.05 10.0 Enriched above background

1.2 Descriptive Statistics

This section provides summary statistics for the data, such as the minimum, maximum, mean, and median. Table 1.1 lays out these statistics for each element.
Table 1.1: Descriptive Statistics of Elements’ Concentration
Element_Symbol Element_Description min max mean median range q1 q3 iqr sd var
Ag Silver 0.100 0.660 2.070968e-01 0.180 0.560 0.1100 0.2725 0.1625 1.194713e-01 1.427340e-02
Al Aluminium 7.210 680000.000 6.757925e+04 35000.000 679992.790 17000.0000 103000.0000 86000.0000 7.859242e+04 6.176769e+09
Au Gold 0.001 0.022 6.337700e-03 0.006 0.021 0.0040 0.0080 0.0040 3.715600e-03 1.380000e-05
Ba Barium 7.000 10000.000 6.094583e+02 412.500 9993.000 189.7500 653.2500 463.5000 1.114878e+03 1.242953e+06
Be Beryllium 0.100 18.000 3.377132e+00 2.000 17.900 1.0000 4.0000 3.0000 3.400535e+00 1.156363e+01
Bi Bismuth 0.100 3.200 4.680357e-01 0.305 3.100 0.2000 0.6000 0.4000 4.193643e-01 1.758664e-01
Cd Cadmium 0.010 0.720 1.245798e-01 0.090 0.710 0.0500 0.1850 0.1350 1.092668e-01 1.193920e-02
Ce Cerium 3.600 380.000 6.121111e+01 63.050 376.400 31.1250 81.3750 50.2500 3.806216e+01 1.448728e+03
Co Cobalt 2.000 134.000 1.540041e+01 10.000 132.000 6.0000 15.2500 9.2500 1.926813e+01 3.712609e+02
Cr Chromium 1.000 897.000 3.656522e+01 17.000 896.000 11.0000 29.0000 18.0000 8.695052e+01 7.560393e+03
Cs Caesium 0.110 31.600 4.914583e+00 4.245 31.490 2.4450 6.3575 3.9125 4.188090e+00 1.754009e+01
Cu Copper 1.000 255.000 4.202480e+01 47.000 254.000 16.0000 60.7500 44.7500 2.899111e+01 8.404847e+02
Dy Dysprosium 0.400 18.500 5.325476e+00 5.310 18.100 3.5000 6.7150 3.2150 2.964792e+00 8.789990e+00
Er Erbium 0.200 11.400 3.222421e+00 3.035 11.200 2.0000 4.0725 2.0725 1.875259e+00 3.516598e+00
Eu Europium 0.100 6.840 1.474484e+00 1.450 6.740 0.8000 1.9200 1.1200 8.523855e-01 7.265611e-01
Fe Iron 0.190 339126.000 1.659509e+04 6301.000 339125.810 2000.0000 15000.0000 13000.0000 3.732040e+04 1.392812e+09
Ga Gallium 1.300 52.300 2.198135e+01 21.950 51.000 12.9000 31.6250 18.7250 1.161978e+01 1.350192e+02
Gd Gadolinium 0.500 25.100 6.038294e+00 5.750 24.600 3.6000 7.5550 3.9550 3.424170e+00 1.172494e+01
Ge Germanium 0.140 70.000 1.197000e+01 0.550 69.860 0.2925 15.2000 14.9075 2.084941e+01 4.346978e+02
HREE Ho+Er+Tm+Yb+Lu 1.100 30.600 8.670528e+00 8.200 29.500 5.4625 10.7525 5.2900 4.811790e+00 2.315332e+01
Ho Holmium 0.100 3.800 1.081349e+00 1.045 3.700 0.7000 1.3550 0.6550 6.133162e-01 3.761567e-01
In Indium 0.020 0.360 7.167890e-02 0.050 0.340 0.0300 0.1000 0.0700 5.584380e-02 3.118500e-03
LREE La+Ce+Pr+Nd+Pm+Sm 11.700 554.500 1.342750e+02 140.060 542.800 81.1000 180.3000 99.2000 7.105354e+01 5.048606e+03
La Lanthanum 3.000 76.200 2.612863e+01 27.900 73.200 12.0000 36.0000 24.0000 1.440875e+01 2.076121e+02
Li Lithium 5.000 285.000 4.772432e+01 40.000 280.000 15.0000 64.5000 49.5000 3.948586e+01 1.559133e+03
Lu Lutetium 0.000 2.000 4.928685e-01 0.460 2.000 0.3000 0.6100 0.3100 2.941315e-01 8.651330e-02
MREE Eu+Gd+Tb+Dy+Y 2.700 135.100 4.194795e+01 41.410 132.400 24.7000 52.3300 27.6300 2.328776e+01 5.423200e+02
Mn Manganese 1.000 11230.000 3.562388e+02 65.500 11229.000 19.0000 220.5000 201.5000 1.130209e+03 1.277371e+06
Mo Molybdenum 0.100 20.600 4.455838e+00 4.000 20.500 2.0000 5.0000 3.0000 3.291799e+00 1.083594e+01
Nb Niobium 0.500 42.800 7.465746e+00 7.765 42.300 4.3675 9.4175 5.0500 4.824231e+00 2.327321e+01
Nd Neodymium 1.700 115.000 2.977718e+01 31.750 113.300 14.7500 39.5250 24.7750 1.675822e+01 2.808380e+02
Ni Nickel 1.000 360.000 1.697179e+01 7.000 359.000 5.0000 13.0000 8.0000 3.825303e+01 1.463294e+03
Pb Lead 0.890 83.450 2.168821e+01 21.000 82.560 11.0600 28.0000 16.9400 1.454362e+01 2.115168e+02
Pr Praseodymium 0.400 33.400 7.335079e+00 7.750 33.000 3.5000 9.8475 6.3475 4.234981e+00 1.793507e+01
REE La+Ce+Pr+Nd+Sm+Eu+Gd+Tb+Dy+Ho+Er+Tm+Yb+Lu 19.600 611.000 1.593275e+02 165.430 591.400 107.1000 205.5100 98.4100 7.889068e+01 6.223739e+03
REEY La+Ce+Pr+Nd+Sm+Eu+Gd+Tb+Dy+Ho+Er+Tm+Yb+Lu+Y 23.200 613.000 1.885760e+02 193.850 589.800 124.1000 237.1300 113.0300 8.895750e+01 7.913437e+03
Rb Rubidium 0.160 299.000 5.291030e+01 49.400 298.840 14.3000 77.8250 63.5250 4.440035e+01 1.971391e+03
Re Rhenium 0.000 0.003 1.714300e-03 0.002 0.003 0.0010 0.0025 0.0015 1.253600e-03 1.600000e-06
Sc Scandium 2.200 67.800 1.546524e+01 16.200 65.600 9.3250 19.6750 10.3500 8.300461e+00 6.889765e+01
Sm Samarium 0.400 21.100 6.393611e+00 6.570 20.700 3.4500 8.4850 5.0350 3.531674e+00 1.247272e+01
Sn Tin 1.000 13.000 3.990533e+00 3.600 12.000 2.7000 4.8000 2.1000 1.924752e+00 3.704672e+00
Sr Strontium 2.000 1600.000 3.341817e+02 320.500 1598.000 170.0000 467.7500 297.7500 2.128948e+02 4.532421e+04
Ta Tantalum 0.100 3.000 6.490099e-01 0.690 2.900 0.4375 0.8000 0.3625 3.613780e-01 1.305940e-01
Tb Terbium 0.100 2.930 8.821429e-01 0.870 2.830 0.6000 1.1000 0.5000 4.861272e-01 2.363197e-01
Th Thorium 0.470 57.020 1.254171e+01 11.850 56.550 5.6400 16.2000 10.5600 9.130136e+00 8.335938e+01
Tl Thallium 0.030 10.000 1.941964e+00 0.715 9.970 0.3625 1.5100 1.1475 3.123047e+00 9.753423e+00
Tm Thulium 0.100 1.800 4.793496e-01 0.450 1.700 0.3000 0.6000 0.3000 2.693370e-01 7.254240e-02
U Uranium 0.150 12.000 3.609700e+00 3.650 11.850 1.6250 5.0175 3.3925 2.254537e+00 5.082937e+00
V Vanadium 2.000 460.000 1.110602e+02 118.000 458.000 50.0000 150.0000 100.0000 7.121341e+01 5.071349e+03
Y Yttrium 1.000 100.500 2.811004e+01 27.500 99.500 17.0000 35.9000 18.9000 1.681139e+01 2.826228e+02
Yb Ytterbium 0.200 11.900 3.217143e+00 3.080 11.700 1.9750 4.1025 2.1275 1.879058e+00 3.530860e+00
Zn Zinc 1.000 307.000 6.537751e+01 66.000 306.000 17.0000 101.0000 84.0000 4.964529e+01 2.464655e+03
Zr Zirconium 4.000 916.000 1.753635e+02 186.000 912.000 93.7500 228.0000 134.2500 1.125054e+02 1.265747e+04
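
A table of this shape can be built by grouping the long-format data by element and applying one summary function per group. The following is a minimal base R sketch, not the original code: the toy values are illustrative, `summarise_element()` is a hypothetical helper, and it assumes the statistics are computed on Element_Value_ppm with R's default quantile method.

```r
# Toy long-format data; values are illustrative, not taken from the data set.
toy <- data.frame(
  Element_Symbol    = rep(c("Ag", "Au"), each = 4),
  Element_Value_ppm = c(0.1, 0.2, 0.3, 0.4, 0.002, 0.004, 0.006, 0.008)
)

# Hypothetical helper returning the ten statistics used in the table above.
summarise_element <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  c(min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    range  = max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
    q1     = q[1],
    q3     = q[2],
    iqr    = q[2] - q[1],
    sd     = sd(x, na.rm = TRUE),
    var    = var(x, na.rm = TRUE))
}

# One row per element, one column per statistic.
stats <- t(sapply(split(toy$Element_Value_ppm, toy$Element_Symbol),
                  summarise_element))
print(stats)
```

Because the elements span several orders of magnitude (compare Au with Al or Fe), robust columns such as the median and IQR are usually easier to compare across elements than the mean and variance.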

1.3 Distribution Analysis

Figure 1.1: Distribution of Critical Elements

1.4 Distribution of Elements with Reference to PAAS Levels

In this analysis, we assess all critical elements against the PAAS standard. We start with a high-level distribution across the two categories used to identify which values are above or below the standard. A normalised value is flagged as “Enriched above background” if it is above 1; all other values are flagged as “Below background”. Figure 1.2 shows this high-level distribution.
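
The flagging rule can be written out in a few lines. This is a sketch under stated assumptions: the normalised value is taken to be the measured concentration divided by the PAAS value (consistent with the preview rows, where 0.13 / 0.05 = 2.6), the element "Xx" and its values are made up to show the "Below background" case, and a value of exactly 1 is treated as not enriched.

```r
# Toy data: the two Ag rows use values from the preview table; "Xx" is a
# hypothetical element added only to exercise the "Below background" branch.
toy <- data.frame(
  Element_Symbol    = c("Ag", "Ag", "Xx", "Xx"),
  Element_Value_ppm = c(0.13, 0.50, 5, 10),
  PAAS_value_ppm    = c(0.05, 0.05, 10, 10)
)

# Assumed normalisation: measured concentration divided by the PAAS value.
toy$PAAS_normalised_value <- toy$Element_Value_ppm / toy$PAAS_value_ppm

# Strictly above 1 is enriched; everything else (including exactly 1) is not.
toy$Above_PASS_flag <- ifelse(toy$PAAS_normalised_value > 1,
                              "Enriched above background",
                              "Below background")

# Category counts, as plotted in the high-level bar chart.
flag_counts <- table(toy$Above_PASS_flag)
print(flag_counts)
```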

Figure 1.2: The Profile of PAAS Categories

As depicted in the bar chart, the distribution is fairly balanced between the two categories, with 5,724 instances classified as “Enriched above background” and 5,355 as “Below background”. This balance motivates a more detailed analysis of the factors behind the distribution, the significance of enrichment in the context of the dataset, and how these elements behave under different conditions.

Moving on to the element level, we assess how each element’s concentration compares to the PAAS standard. For each element below, the value shown is the normalised value, that is, the concentration measured in the sample divided by the element’s PAAS value. Figure 1.3 shows the profile of each sample against this standard, together with its flag.

Figure 1.3: Distribution of Elements’ Concentration with Reference to PAAS Levels

As can be seen, the majority of normalised values fall between 0 and 10, while the rest can be regarded as outliers. A notable point from this plot is that all ‘Ag’ concentration values are above the PAAS standard, which could indicate that coal waste is rich in ‘Ag’.
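
Both observations can be checked numerically rather than read off the plot. The sketch below uses illustrative toy values (the Ba values in particular are made up) and treats the 0 to 10 interval as inclusive, which is an assumption.

```r
# Toy normalised values; the Ag values mirror the preview table, the Ba
# values are hypothetical and include one large outlier.
toy <- data.frame(
  Element_Symbol        = c("Ag", "Ag", "Ag", "Ba", "Ba", "Ba"),
  PAAS_normalised_value = c(2.6, 2.8, 10.0, 0.4, 1.5, 25.0)
)

# Share of normalised values that fall within [0, 10].
share_0_10 <- mean(toy$PAAS_normalised_value >= 0 &
                   toy$PAAS_normalised_value <= 10)

# Elements whose smallest normalised value is still above 1, i.e. every
# sample of that element is enriched relative to PAAS.
min_by_element  <- tapply(toy$PAAS_normalised_value, toy$Element_Symbol, min)
always_enriched <- names(min_by_element)[min_by_element > 1]

print(share_0_10)
print(always_enriched)
```

On the real data, `always_enriched` would be expected to contain "Ag" if the observation above holds for every sample.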

Additionally, we analyse the distribution of critical elements across the project areas. Figure 1.4 shows how many element records in each project area fall into each of the two categories.

Figure 1.4: The Profile of Project Area with Reference to PAAS Levels

Some key observations:

  • The “Confidential_C” project stands out with a substantial number of element records (3,358) categorised as “Enriched above background”, far more than its “Below background” count (1,192). This suggests that a large share of concentrations in this project exceed the PAAS standard.
  • Projects such as “Fort Cooper”, “Confidential_B” and “Confidential_A” display a more balanced distribution between the two categories, indicating a near-equal mix of records above and below the enrichment threshold.
  • In contrast, several projects, such as “Wandoan”, “Copabella”, “Lake Vermont”, “Collinsville”, “Moorvalle”, “Newlands”, “Metropolitan” and “Rolleston”, have more records classified as “Below background”. This implies that a significant proportion of their element concentrations do not reach the enrichment threshold.
  • A few projects, including “Oaky Creek” and “Unnamed”, contribute minimally to the overall dataset, with very few records in either category.
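
Counts like those in the observations above come from a cross-tabulation of project area against flag. A minimal sketch with made-up rows follows; the project names are taken from the text, but the counts are illustrative only.

```r
# Toy records: three per project, with illustrative flags.
toy <- data.frame(
  Project_Name = c("Confidential_C", "Confidential_C", "Confidential_C",
                   "Wandoan", "Wandoan", "Wandoan"),
  Above_PASS_flag = c("Enriched above background", "Enriched above background",
                      "Below background", "Below background",
                      "Below background", "Enriched above background")
)

# Rows are project areas, columns are the two flag categories.
counts <- table(toy$Project_Name, toy$Above_PASS_flag)

# Share of enriched records within each project (row-wise proportions).
enriched_share <- prop.table(counts, margin = 1)[, "Enriched above background"]

print(counts)
print(round(enriched_share, 2))
```

Comparing row-wise shares rather than raw counts avoids large projects such as “Confidential_C” dominating the picture simply because they have more records.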

Figure 1.5: Distribution of Critical Elements by Each Project Area

1.5 Correlation Matrix Plot

Figure 1.6: Correlation Matrix Plot of Critical Elements

Figure 1.6 shows the relationships between the elements. Some key observations are:

  • Strong positive correlations: A number of element pairs have very strong correlations (above 0.95), such as Er-Dy, Gd-Eu, Ho-Dy, Ho-Er, Pr-Nd, Sm-Nd, Tb-Dy, Tb-Gd, Tb-Ho, Yb-Er, and Yb-Ho. Many more pairs have correlations above 0.8.
  • Moderate correlations: Many pairs fall between 0.6 and 0.8, which indicates fairly strong relationships.
  • Weaker correlations: Elements such as Ba show weaker correlations with most other elements, indicating less consistent co-occurrence or independent behavior within the dataset. This is also the case for Sr.
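
A correlation matrix of this kind needs the long-format data reshaped so that each sample is a row and each element a column. The sketch below is one way to do that in base R, with made-up values. It assumes Pearson correlation on pairwise-complete observations; the method used for the figure is not stated. Because dividing an element by its (positive) PAAS value does not change a Pearson correlation, raw and normalised values give the same matrix.

```r
# Toy long-format data: five samples, three elements, illustrative values.
toy <- data.frame(
  Sample_ID         = rep(paste0("S", 1:5), times = 3),
  Element_Symbol    = rep(c("Dy", "Er", "Ba"), each = 5),
  Element_Value_ppm = c(1, 2, 3, 4, 5,             # Dy
                        0.6, 1.1, 1.7, 2.2, 2.9,   # Er, tracks Dy closely
                        300, 120, 410, 90, 250)    # Ba, unrelated to Dy/Er
)

# Reshape to a samples x elements matrix (mean handles duplicate records).
wide <- tapply(toy$Element_Value_ppm,
               list(toy$Sample_ID, toy$Element_Symbol), mean)

# Assumed method: Pearson correlation on pairwise-complete observations.
corr <- cor(wide, use = "pairwise.complete.obs")

# List each pair once (upper triangle) where the correlation exceeds 0.95.
idx <- which(upper.tri(corr) & corr > 0.95, arr.ind = TRUE)
strong_pairs <- data.frame(
  element_1 = rownames(corr)[idx[, "row"]],
  element_2 = colnames(corr)[idx[, "col"]],
  r         = corr[idx]
)
print(strong_pairs)
```

Lowering the threshold to 0.8 or 0.6 in the same `which()` call gives the pairs in the other bands discussed above.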

1.6 Scatter Plot

1.6.1 Correlation of Dy

1.6.2 Correlation of Er

1.6.3 Correlation of Eu

1.6.4 Correlation of Ga

1.6.5 Correlation of Gd

1.6.6 Correlation of Ho

1.6.7 Correlation of Nd

1.6.8 Correlation of Pb

1.6.9 Correlation of Pr

1.6.10 Correlation of Rb

1.6.11 Correlation of Sm

1.6.12 Correlation of Tb

1.6.13 Correlation of Th

1.6.14 Correlation of Yb

1.6.15 Correlation of Zr